Module 11: Interactive Graphics with plotly
The Quarto file used to generate this HTML file can be obtained by clicking on the Code Links beneath the Table of Contents. This opens the Math 3376 Posit Cloud Project, where you can open the file 11-Interactive-Graphics-with-plotly.qmd.
Best R Packages for Creating Interactive Graphics
Interactive graphics are visual displays that respond to user interaction (e.g., hovering, zooming, or rotating) by dynamically providing additional information.
The best packages for creating interactive graphics in R are:
- plotly: provides functions to make ggplot2 graphics interactive and a custom interface to the JavaScript library plotly.js inspired by the grammar of graphics. This is perhaps the best known interactive visualization library.
- ggiraph: creates interactive ggplot2 graphics using htmlwidgets.
- rgl: provides functions for 3D interactive graphics using OpenGL and for exporting to various standard 3D file formats.
- shiny: a package for creating interactive web apps.
An Example with plotly
We use plotly to create a surface plot of the Maunga Whau volcano data.
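The code for this example is not reproduced in this file. A minimal sketch, assuming the built-in volcano matrix from the datasets package (87 rows by 61 columns of heights), is shown below; plot_ly() draws a numeric matrix as a 3D surface when type = "surface".

```r
library(plotly)

# volcano: matrix of heights for the Maunga Whau volcano (datasets package)
# each cell of the matrix becomes one point on the surface
volcano_surface <- plot_ly(z = ~volcano, type = "surface")

# printing volcano_surface in a code cell or at the console displays
# the interactive surface (rotate, zoom, and hover for heights)
```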
An Example with ggiraph
We use ggiraph to provide an example of an interactive graphic using the starwars data from the dplyr package.
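The code for this example is also omitted here. One way such a graphic could be built, assuming the height, mass, and name columns of dplyr's starwars data, is to replace geom_point() with ggiraph's geom_point_interactive() and pass the resulting ggplot object to girafe(). The tooltip and data_id choices below are illustrative, not necessarily the ones used in the original example.

```r
library(ggplot2)
library(ggiraph)
data(starwars, package = "dplyr")

# keep characters with both height and mass recorded
sw <- subset(starwars, !is.na(height) & !is.na(mass))

# tooltip: text shown on hover; data_id: highlights the hovered point
gg <- ggplot(sw, aes(x = height, y = mass)) +
  geom_point_interactive(aes(tooltip = name, data_id = name))

# girafe() converts the ggplot object into an interactive htmlwidget
sw_widget <- girafe(ggobj = gg)
```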
An Example with rgl
We use rgl to display a 3-dimensional perspective plot with contour levels of the Maunga Whau volcano. A 3-dimensional interactive graphic will be created in the rendered html file.
- Running the single code cell below in RStudio in Posit Cloud will not display an image in the output.
- Running the code cell below in a local installation of RStudio (on your own computer) may generate the interactive graphic.
- Mac users may need to download and install XQuartz (https://www.xquartz.org/) to generate the interactive graphic when running the code cell.
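A minimal sketch of such a cell, assuming the volcano matrix and using rgl's persp3d() for the surface, follows. The vertical exaggeration (2 *) and the 10 m grid spacing are choices made for this illustration, and the contour lines are left out to keep it short. The "glX 1" that can appear in the output is most likely the device identifier printed by open3d().

```r
library(rgl)

# heights from the volcano matrix, exaggerated vertically for visibility
z <- 2 * volcano
# grid coordinates, assuming 10 m spacing between grid points
x <- 10 * seq_len(nrow(z))
y <- 10 * seq_len(ncol(z))

open3d()  # open a new rgl device (prints its id)
persp3d(x, y, z, col = "forestgreen", aspect = "iso")

# rglwidget() embeds the current scene in HTML output (e.g., a rendered Quarto file)
volcano_widget <- rglwidget()
```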
Examples of Shiny Apps
A Shiny app created for MATH 3382: Statistical Theory to explore sampling distributions and form conjectures about the Central Limit Theorem can be found at https://adamspiegler.shinyapps.io/clt_quake/.
A Shiny app example related to NCAA swim teams can be found at https://shiny.rstudio.com/gallery/ncaa-swim-team-finder.html.
Creating Interactive Plots with plotly
As stated on the plotly website (https://plotly.com/r/getting-started/):
plotly is an R package for creating interactive web-based graphs via the open source JavaScript graphing library plotly.js.
It can be used to add interactivity to plots created with ggplot2 or create interactive plots on its own. Thus, the two approaches for creating interactive graphics with plotly are:
- Create a plot using ggplot2 and use the ggplotly() function from the plotly package to make the graphic interactive.
- Create an interactive plot with the plot_ly() function from plotly (and not ggplot2).
Bar Plots
We can create an interactive bar plot using the two different methods outlined above.
Creating Interactive Bar Plots Using ggplot2 and ggplotly()
The easiest way to create interactive graphics is to:
- Create a plot using ggplot2.
- Use the ggplotly function from the plotly package to make the graphic interactive.
The advantage of this approach is that it builds on our prior knowledge of working with ggplot2.
The disadvantage of this approach is that it may not give us the desired control over the aspects of the graphic that are interactive.
First, we load the necessary packages. We load the core tidyverse packages (which include ggplot2 and dplyr), the plotly package to create the graphics, and the penguins data set from the palmerpenguins package, which contains the data we will plot.
library(tidyverse, quietly = TRUE)
library(plotly, quietly = TRUE)
data(penguins, package = "palmerpenguins")
Question 1
Use ggplot2 to create a basic bar plot of penguin species.
Solution to Question 1
Insert code cell to solve question.
Converting a Static ggplot to an Interactive Plot
In the code below, we use ggplot2 to create a basic bar plot of penguin species. We assign this graphic the name ggbar.
We then use the ggplotly function in the plotly package to make the graphic interactive.
The interactive graphic provides the frequency associated with each species when we hover over a bar.
# bar plot of penguin species
ggbar <-
  ggplot(penguins, aes(x = species)) +
  geom_bar()
# make bar plot interactive
ggplotly(ggbar)
Creating Interactive Bar Plots Using plot_ly()
Next, we use the direct capabilities of the plotly package to create a bar plot of penguin species.
In general, the plot_ly() function in the plotly package is all we need to create basic interactive graphics.
We can also add layers to the graphics using the various add_*() functions and customize the layout using the layout() function.
The main arguments to the plot_ly() function are:
- data: an optional data frame whose variables will be plotted. To access a variable in data, we must use ~ before the variable's name.
- type: a character string indicating the type of plot to create, e.g., "bar", "histogram", "box", "violin", "scatter".
- ...: arguments passed to the plot type that specify the attributes of the graphic (similar to the aesthetics in ggplot2), e.g., x, y, etc.
- split: discrete values used to create multiple traces (one trace per value). This is similar to the group aesthetic in ggplot2. A “trace” describes “a single series of data in a graph” (https://plotly.com/r/reference/index/).
- color: values mapped to a fill color.
- alpha: a number between 0 and 1 controlling the transparency of the graphic.
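To see several of these arguments working together, here is a short sketch (the choice of flipper_length_mm and body_mass_g is only for illustration): split draws one trace per species, alpha makes the markers semi-transparent, and layout() relabels the axes.

```r
library(plotly)
data(penguins, package = "palmerpenguins")

p_split <- plot_ly(penguins,
                   x = ~flipper_length_mm,   # ~ refers to a column of penguins
                   y = ~body_mass_g,
                   split = ~species,         # one trace per species
                   alpha = 0.6,              # semi-transparent markers
                   type = "scatter",
                   mode = "markers") |>
  layout(xaxis = list(title = "flipper length (mm)"),
         yaxis = list(title = "body mass (g)"))
```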
To create a bar plot using plot_ly() with type = "bar", we need a data frame containing the count associated with each level of the categorical variable we want to display.
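As an aside, plot_ly() can also do the counting itself. With type = "histogram" and a categorical x attribute, plotly.js counts the rows in each category, so the raw penguins data can be passed directly. The sketch below shows this alternative and checks the counts with table().

```r
library(plotly)
data(penguins, package = "palmerpenguins")

# plotly.js counts the number of rows for each species
species_hist <- plot_ly(penguins, x = ~species, type = "histogram")

# the bar heights should match these counts:
# Adelie 152, Chinstrap 68, Gentoo 124
table(penguins$species)
```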
Question 2
Create a data frame named species_counts, with a column named frequency, that summarizes the counts of each species. Use the group_by(), summarize(), and n() functions from dplyr.
Solution to Question 2
Insert code cell to solve question.
Using a Frequency Table to Construct an Interactive Bar Plot
Once we have a data frame that describes the frequency associated with each level of the categorical variable, we can create a bar plot using plotly. In the plot_ly() function, we:
- Set the type argument to "bar".
- Associate the levels of the categorical variable with the x attribute.
- Associate the frequency of each level with the y attribute.
The interactive graphic provides the frequency associated with each species when we hover over a bar.
# create interactive bar chart
# (species_counts is the data frame created in Question 2)
plot_ly(species_counts,
        x = ~species,
        y = ~frequency,
        type = "bar")
Histograms
We now create interactive histograms using the same two approaches.
Creating Interactive Histograms Using ggplot2 and ggplotly()
In the code below, we:
- Use ggplot2 to create a basic histogram of the bill_length_mm variable for the penguins data. We assign this graphic the name gghist.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the midpoint of each bin and the number of penguins falling in each bin.
Question 3
Create a basic histogram to display the distribution of bill_length_mm. Assign the histogram to an object named gghist. Then make the plot interactive with ggplotly().
Solution to Question 3
Insert code cell to solve question.
Creating Interactive Histograms Using plot_ly()
To create a similar histogram using the plot_ly function, we:
- Set the type argument to "histogram".
- Associate bill_length_mm with the x attribute.
- Set the nbinsx argument to control the maximum number of bins in the histogram (plotly.js may choose fewer).
The interactive histogram indicates the endpoints of each bin and the number of penguins in each bin.
plot_ly(penguins,               # data
        x = ~bill_length_mm,    # x attribute
        type = "histogram",     # create histogram
        nbinsx = 30)            # request at most 30 bins
The histogram produced by the plot_ly function looks a bit different from the histogram produced by ggplot2 because the locations of the bins are different. To make them the same, we can use ggplot2::layer_data to get the “under the hood” information ggplot2 uses to produce its plot.
In the code below, we use layer_data to access the internal data used by ggplot2 to create the histogram. The xmin variable indicates the lower bound of each histogram bin; the lower bound of the leftmost bin is 31.76724. We then use the diff function to determine the bin width (it computes the differences between successive lower bounds).
# get histogram data from gghist (created in Question 3)
datahist <- layer_data(gghist)
# determine starting point
head(datahist$xmin, 3)
[1] 31.76724 32.71552 33.66379
# determine bin width
head(diff(datahist$xmin), 3)
[1] 0.9482759 0.9482759 0.9482759
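Where do these numbers come from? The sketch below reproduces them by hand, assuming ggplot2's default of bins = 30 and our reading of its default binning rule: the bin width is the range of the data divided by bins - 1, and the bin edges are offset by half a bin width. This mirrors ggplot2's internal behavior in recent versions rather than a documented API, so treat it as an illustration only.

```r
data(penguins, package = "palmerpenguins")

x_range <- range(penguins$bill_length_mm, na.rm = TRUE)  # 32.1 to 59.6
bins <- 30                                 # ggplot2's default number of bins
width <- diff(x_range) / (bins - 1)        # 27.5 / 29
boundary <- width / 2                      # assumed default bin boundary

# shift the boundary by whole bin widths to just below the smallest value
start <- boundary + floor((x_range[1] - boundary) / width) * width

round(c(start = start, width = width), 7)
```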
Now that we know the start location of the leftmost bin and the size (width) of the bins, we can pass these values as the start and size elements of a named list supplied to the xbins argument of plot_ly. This creates an interactive histogram that mimics the one produced by ggplot2.
plot_ly(penguins,
        x = ~bill_length_mm,
        type = "histogram",
        xbins = list(start = 31.76724,
                     size = 0.9482759))
Density Plots
We examine how to construct interactive density plots using two approaches.
Creating Interactive Density Plots Using ggplot2 and ggplotly()
In the code below, we:
- Use ggplot2 to create a density plot of bill_length_mm for each species that uses semi-transparent color to distinguish the different species. We assign this plot the name ggdens.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the species, bill_length_mm, and density when we hover over a density curve.
ggdens <-
  ggplot(penguins, aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.3)
ggplotly(ggdens)
Creating Interactive Density Plots Using plot_ly()
Surprisingly, there is no easy way to create a standard density plot natively using plot_ly.
To work around this, we can compute the density estimate manually with the stats::density function and extract the x and y components it returns. However, we want to do this separately for each species, so we instead use the ggplot2::layer_data function to get the same information from our previous ggplot2 graphic.
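For comparison, the stats::density route can be sketched as follows, assuming we compute one density estimate per species with split() and lapply(). Unlike geom_density(), density() extends each curve about three bandwidths beyond the observed data (its default cut = 3), so these curves will be slightly wider than the ggplot2 version.

```r
library(plotly)
data(penguins, package = "palmerpenguins")

# list of bill lengths, one element per species
bills <- split(penguins$bill_length_mm, penguins$species)

# one data frame of (x, density) pairs per species, stacked together
dens_df <- do.call(rbind, lapply(names(bills), function(sp) {
  d <- density(bills[[sp]], na.rm = TRUE)  # 512 evaluation points by default
  data.frame(species = sp, x = d$x, density = d$y)
}))

p_dens <- plot_ly(dens_df, x = ~x, y = ~density, split = ~species,
                  type = "scatter", mode = "lines") |>
  layout(xaxis = list(title = "bill_length_mm"))
```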
We assign the name dens_data to the information from the layer_data function. dens_data stores the relevant density curve information in the x and density variables, while the group variable distinguishes the different species. We turn the group variable into a factor with the correct species names.
# extract density data from ggdens
dens_data <- layer_data(ggdens)
# view data
head(dens_data, n = 3)
 fill y x density scaled ndensity count n
1 #F8766D 0.006271601 32.10000 0.006271601 0.04822972 0.04822972 0.9470118 151
2 #F8766D 0.006598292 32.15382 0.006598292 0.05074203 0.05074203 0.9963421 151
3 #F8766D 0.006937409 32.20763 0.006937409 0.05334990 0.05334990 1.0475488 151
flipped_aes PANEL group ymin ymax weight colour alpha linewidth
1 FALSE 1 1 0 0.006271601 1 black 0.3 0.5
2 FALSE 1 1 0 0.006598292 1 black 0.3 0.5
3 FALSE 1 1 0 0.006937409 1 black 0.3 0.5
linetype
1 1
2 1
3 1
# convert group to a factor labeled with the species names
dens_data$group <-
  factor(dens_data$group,
         labels = c("Adelie", "Chinstrap", "Gentoo"))
We want to trace the density curves for each species in a scatter plot that connects the points for each species. Using plot_ly and the dens_data data frame, we:
- Associate the x variable with the x attribute.
- Associate the density variable with the y attribute.
- Associate the group variable with the split argument. This is roughly equivalent to the group aesthetic in ggplot2.
- Specify type = "scatter" to produce a scatter plot.
- Specify mode = "lines" to connect the points using a line but not show the points (markers) themselves.
- Combine the native R pipe, |>, with the layout function to change the x-axis label.
The resulting interactive density plot indicates the value of bill_length_mm for each species and the associated density.
plot_ly(dens_data,
        x = ~x,
        y = ~density,
        split = ~group,
        type = "scatter",
        mode = "lines") |>
  layout(xaxis = list(title = "bill_length_mm"))
Box Plots
We create interactive box plots using two approaches.
Creating Interactive Box Plots Using ggplot2 and ggplotly()
In the code below, we:
- Use ggplot2 to create a box plot of bill_length_mm for each species. We associate species with the x-variable and bill_length_mm with the y-variable. We assign this plot the name ggbox.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the 5-number summary (min, Q1, median, Q3, max) of bill_length_mm for each species. It also indicates the value of any outlier.
ggbox <-
  ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  geom_boxplot()
ggplotly(ggbox)
Creating Interactive Box Plots Using plot_ly()
We can create a similar set of parallel box plots using plotly. Using the plot_ly function, we:
- Associate species with the x attribute.
- Associate bill_length_mm with the y attribute.
- Specify type = "box".
The interactive graphic indicates the species and 5-number summary (min, Q1, median, Q3, max) of bill_length_mm for each box plot.
plot_ly(penguins,
        x = ~species,
        y = ~bill_length_mm,
        type = "box")
Violin Plots
We use two approaches to create interactive violin plots.
Creating Interactive Violin Plots Using ggplot2 and ggplotly()
In the code below, we:
- Use ggplot2 to create a violin plot of bill_length_mm for each species. We assign this plot the name ggvio.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the species, bill_length_mm, and the associated density for that value of bill_length_mm when we hover over a violin curve.
ggvio <-
  ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  geom_violin()
ggplotly(ggvio)
Creating Interactive Violin Plots Using plot_ly()
To create a similar plot using the plot_ly function, we:
- Associate species with the x attribute.
- Associate bill_length_mm with the y attribute.
- Specify type = "violin".
The interactive graphic indicates the species, bill_length_mm, and the associated density for that value of bill_length_mm when we hover over a violin curve, as well as the 5-number summary of the associated box plot.
plot_ly(penguins,
        x = ~species,
        y = ~bill_length_mm,
        type = "violin")
Scatter Plots
We will create an interactive scatter plot of bill_length_mm versus body_mass_g for the penguins data that uses different colors and shapes to distinguish the different species.
We will investigate two simple approaches for doing this.
Creating Interactive Scatter Plots Using ggplot2 and ggplotly()
In the code below, we:
- Use ggplot2 to create the grouped scatter plot. We assign this plot the name ggscatter.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the species (twice), body_mass_g, and bill_length_mm when we hover over a point.
ggscatter <-
  ggplot(penguins,
         aes(x = body_mass_g,
             y = bill_length_mm,
             color = species,
             shape = species)) +
  geom_point()
ggplotly(ggscatter)
Notice that species is indicated twice when we hover over a point. We can correct this behavior by using the tooltip argument to specify the attributes (x, y, color, etc.) we want to display when our mouse hovers over a point.
# restrict attributes displayed on hover
ggplotly(ggscatter,
         tooltip = c("shape", "x", "y"))
Creating Interactive Scatter Plots Using plot_ly()
We can create a similar interactive scatter plot using plot_ly. We:
- Specify type = "scatter" and mode = "markers" to indicate that we want to plot points.
- Associate body_mass_g with the x attribute and bill_length_mm with the y attribute for the actual points.
- Associate species with the color and symbol attributes to change those aspects of the plot.
The resulting scatter plot indicates the species, body_mass_g, and bill_length_mm of each point.
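The tooltip argument belongs to ggplotly(). With plot_ly(), a comparable effect can be sketched by building our own hover label: supply a character vector to the text attribute and set hoverinfo = "text" so that only that label appears. The label wording below is illustrative, and "<br>" is used because plotly.js hover labels are HTML.

```r
library(plotly)
data(penguins, package = "palmerpenguins")

p_hover <- plot_ly(penguins,
                   x = ~body_mass_g,
                   y = ~bill_length_mm,
                   color = ~species,
                   type = "scatter",
                   mode = "markers",
                   # custom hover label built from three columns
                   text = ~paste0("species: ", species,
                                  "<br>body mass (g): ", body_mass_g,
                                  "<br>bill length (mm): ", bill_length_mm),
                   hoverinfo = "text")  # show only the text label on hover
```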
plot_ly(penguins,
        x = ~body_mass_g,
        y = ~bill_length_mm,
        color = ~species,
        symbol = ~species,
        mode = "markers",
        type = "scatter")
Scatter Plots with Smooths
We now attempt to add some linear regression smooths to an interactive scatter plot.
Adding Smooths to an Interactive Scatter Plot Using ggplot2 and ggplotly()
In the code below, we:
- Use ggplot2 to create a scatter plot of bill_length_mm versus body_mass_g that uses different colors and shapes to distinguish the different species.
- Add a second layer to the plot that provides an "lm" smooth for the points of each species. We assign this plot the name ggsmooth.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the species, body_mass_g, and bill_length_mm of each point, as well as the points on the fitted regression line for each species.
ggsmooth <-
  ggplot(penguins,
         aes(x = body_mass_g,
             y = bill_length_mm,
             color = species,
             shape = species)) +
  geom_point() +
  geom_smooth(method = "lm")
ggplotly(ggsmooth)
Adding Smooths to an Interactive Scatter Plot Using plot_ly()
Adding a smooth to a plot using plotly natively is a bit more difficult because you have to manually compute the smooth, extract the fitted line for each group, and then add the fitted lines as a layer to an existing scatter plot.
In the code below, we fit a separate lines (interaction) model, which produces the same fitted lines as fitting a separate linear regression model to the points of each species. We then add the fitted values from this model as a new variable to the penguins data frame.
# fit separate lines/interaction model
# (body_mass_g * species gives each species its own intercept and slope)
lmod <- lm(bill_length_mm ~ body_mass_g * species,
           data = penguins, na.action = na.exclude)
# add fitted values for each group
# (na.exclude pads rows with missing values with NA)
penguins$fitted <- fitted(lmod)
Now that all relevant data is in the penguins data frame, we:
- Use the same syntax as before to create a scatter plot of bill_length_mm versus body_mass_g that uses different colors and shapes for the points of each species.
- Use the native R pipe operator to add a “trace” to the original plot using add_lines.
  - We supply the penguins data frame to add_lines (make sure to specify data =, since data is not the first argument of the add_lines function).
  - Associate body_mass_g with the x attribute.
  - Associate fitted with the y attribute.
  - Change the color of the lines by associating species with the color attribute.
  - Specify inherit = FALSE, which means we do not inherit any of the attribute specifications in the plot_ly function (which are otherwise passed by default).
The interactive scatter plot allows us to see the species, bill_length_mm, and body_mass_g for each point and the points on the fitted line associated with each species.
plot_ly(penguins,
        x = ~body_mass_g,
        y = ~bill_length_mm,
        mode = 'markers',
        color = ~species,
        symbol = ~species,
        type = 'scatter') |>
  add_lines(data = penguins,
            x = ~body_mass_g,
            y = ~fitted,
            color = ~species,
            inherit = FALSE)
Interactive Maps
Creating Interactive Maps Using ggplot2 and ggplotly()
Interactive maps can provide a lot of information. We will create an interactive map using ggplot2, plotly, and the sf package.
Using the Simple Features (sf) Package
First, we use the st_read function from the sf package to read a shapefile of North Carolina counties that is installed with the sf package. The imported shapefile is automatically converted to an sf data frame. The imported object has many variables, but we point out three:
- NAME: the name of each North Carolina county.
- BIR74: the number of recorded births in each county in 1974.
- geometry: the MULTIPOLYGON associated with each North Carolina county.
library(sf, quietly = TRUE)
# import sf object from shapefile in sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# display first 3 rows of nc for certain variables
head(nc[c("NAME", "BIR74", "geometry")], n = 3)
Simple feature collection with 3 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -81.74107 ymin: 36.23388 xmax: -80.43531 ymax: 36.58965
Geodetic CRS: NAD27
NAME BIR74 geometry
1 Ashe 1091 MULTIPOLYGON (((-81.47276 3...
2 Alleghany 487 MULTIPOLYGON (((-81.23989 3...
3 Surry 3188 MULTIPOLYGON (((-80.45634 3...
In the code below, we:
- Use ggplot2 to create a choropleth map of BIR74 for each county using geom_sf.
  - We specify fill = BIR74 so that the fill color of each county is based on the BIR74 variable.
  - We also associate the NAME variable with the label aesthetic so that the name of each county is displayed when we hover over a county.
  - Use scale_fill_viridis_c to change the color palette used for the fill color.
  - We assign this plot the name ggsf.
- Use plotly::ggplotly to make the graphic interactive.
The interactive graphic indicates the number of births in each county and the county name when we hover over a county.
# plot sf object using ggplot2
ggsf <-
  ggplot(nc) +
  geom_sf(aes(fill = BIR74, label = NAME)) +
  scale_fill_viridis_c()
# make map interactive
ggplotly(ggsf)
Is there a way to provide information from multiple variables simultaneously when we hover over a county? Yes! But we have to be creative. We:
- Use the paste0 function to create a new variable, info, that combines multiple variables into a single character string for each county. The \n indicates the start of a new line. We add a new line before each variable name.
- Add the info variable as a variable to the nc data frame.
# combine multiple variables into a character string
# (one per county)
info <- paste0(
  "\nname: ", nc$NAME,
  "\narea: ", nc$AREA,
  "\nbirths in 1974: ", nc$BIR74,
  "\nSIDS cases in 1974: ", nc$SID74)
# print first 2 values of info
info[1:2]
[1] "\nname: Ashe\narea: 0.114\nbirths in 1974: 1091\nSIDS cases in 1974: 1"
[2] "\nname: Alleghany\narea: 0.061\nbirths in 1974: 487\nSIDS cases in 1974: 0"
# add info to nc
nc$info <- info
Now, we use info as the label aesthetic in geom_sf and specify tooltip = "label" so that only the label variable is displayed when we hover over a county.
# create map that fills based on BIR74 but the tooltip
# based on info
ggsf <-
  ggplot(nc) +
  geom_sf(aes(fill = BIR74, label = info)) +
  scale_fill_viridis_c()
# show only label tooltip
ggplotly(ggsf, tooltip = "label")
Creating Interactive Maps Using plot_ly()
We can create a similar plot using plot_ly. We:
- Specify type = "scatter" and mode = "lines".
- Associate the info variable in nc with the split attribute to draw a separate trace for each county. We could have used NAME, but then only the NAME of each county would be displayed when we hover. This way, we get additional information.
- Associate the BIR74 variable in nc with the color attribute to fill each county with a color from a gradient.
- Specify showlegend = FALSE so that only the color scale is displayed and no legend related to info. This is a critical step.
- Specify alpha = 1 so that the colors aren't muted.
- Specify hoverinfo = "text" so that only the split information is displayed.
- Pipe this graphic into the colorbar function and change the title to “BIR74” (otherwise it gets displayed twice).
plot_ly(nc,
        color = ~BIR74,
        split = ~info,
        showlegend = FALSE,
        alpha = 1,
        type = "scatter",
        mode = "lines",
        hoverinfo = "text") |>
  colorbar(title = "BIR74")